Robust N-gram Based Syntactic Analysis Using Segmentation Words
Authors
Abstract
We describe an N-gram based syntactic analysis using a dependency grammar. Instead of generalizing syntactic rules, N-gram information about parts of speech is used to segment a sequence of words into two clauses. A special part of speech, called a segmentation word, which corresponds to the beginning or end symbol of a clause, is introduced to express sentence structure. Segmentation words for each clause were learned using the hill climbing method and a small bracketed corpus. Experimental results for Japanese sentences showed that the N-gram based syntactic parser achieved 72.2% recall, which is about the same level of performance as a probabilistic context-free grammar based parser with human-made language-dependent information.
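The idea of using part-of-speech N-grams to pick a clause boundary can be illustrated with a minimal sketch. This is not the authors' implementation: the tag set, the toy corpus, the `<SEG>` symbol, the add-one smoothing, and the function names `train_bigrams` and `best_split` are all assumptions made for illustration, and the hill climbing step used to learn segmentation words is not covered.

```python
import math
from collections import Counter

SEG = "<SEG>"  # hypothetical boundary symbol standing in for a segmentation word

# Toy bracketed corpus (invented for illustration): part-of-speech sequences
# in which the clause boundary is already marked with SEG.
TRAIN = [
    ["N", "P", SEG, "N", "P", "V"],
    ["N", "P", SEG, "ADJ", "N", "P", "V"],
    ["ADJ", "N", "P", SEG, "N", "P", "V"],
]


def train_bigrams(sequences):
    """Count bigram contexts and bigrams, including the boundary symbol."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        contexts.update(seq[:-1])          # every tag that has a successor
        bigrams.update(zip(seq, seq[1:]))  # adjacent tag pairs
    return contexts, bigrams, len(vocab)


def sequence_score(seq, contexts, bigrams, vocab_size):
    """Sum of add-one smoothed log P(cur | prev) over adjacent tags."""
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(seq, seq[1:])
    )


def best_split(tags, contexts, bigrams, vocab_size):
    """Return the index k whose split tags[:k] | tags[k:] scores highest
    when SEG is inserted between the two parts."""
    return max(
        range(1, len(tags)),
        key=lambda k: sequence_score(
            tags[:k] + [SEG] + tags[k:], contexts, bigrams, vocab_size
        ),
    )
```

With this toy data, `best_split(["N", "P", "N", "P", "V"], *train_bigrams(TRAIN))` returns 2, placing the boundary after the first noun-particle pair. Applying such a split recursively would give a bracketing; whether the paper proceeds this way is not stated in the abstract.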
Similar resources
Robust Potato Color Image Segmentation using Adaptive Fuzzy Inference System
Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...
Language Model Adaptation Using Dirichlet Class Language Model Based on Part-of-Speech
Language modeling has many applications in a large variety of domains. Performance of this model depends on its adaptation to a particular style of data. Accordingly, adaptation methods endeavour to apply syntactic and semantic characteristics of the language for language modeling. The previous adaptation methods such as family of Dirichlet class language model (DCLM) extract class of history w...
Which is More Suitable for Chinese Word Segmentation, the Generative Model or the Discriminative One?
Since the traditional word-based n-gram model, a generative approach, cannot handle out-of-vocabulary (OOV) words in the test set, the character-based discriminative approach has been widely adopted recently. However, this discriminative model, though more robust to OOV words, fails to deliver satisfactory performance for in-vocabulary (IV) words that have been observed before...
A Portable And Quick Japanese Parser: QJP
QJP is a portable and quick software module for Japanese processing. QJP analyzes a Japanese sentence into segmented morphemes/words with tags and a syntactic bunsetsu kakari-uke structure based on two strategies: a) morphological analysis based on character types and functional words, and b) syntactic analysis by simple treatment of structural ambiguities and ignoring semantic information. ...
Integration of morphological and syntactic analysis based on LR parsing algorithm
Morphological analysis of Japanese is very different from that of English, because no spaces are placed between words. The analysis includes segmentation of words. However, ambiguities in segmentation are not always resolved with morphological information alone. This paper proposes a method to integrate morphological and syntactic analysis based on the LR parsing algorithm. An LR table derived fr...